COMMENTARY 


Emerging Microbes and Infections (2012) 1, e35; doi:10.1038/emi.2012.45 
© 2012 SSCC. All rights reserved 2222-1751/12 


www.nature.com/emi 


Genetic relatedness of the novel human group C 
betacoronavirus to Tylonycteris bat coronavirus HKU4 


and Pipistrellus bat coronavirus HKU5


Patrick CY Woo, Susanna KP Lau, Kenneth SM Li, Alan KL Tsang and Kwok-Yung Yuen


Emerging Microbes and Infections (2012) 1, e35; doi:10.1038/emi.2012.45; published online 7 November 2012 


The recent outbreak of severe respiratory
infections associated with a novel
group C betacoronavirus (HCoV-EMC)
from Saudi Arabia has drawn global atten- 
tion to another highly probable “SARS- 
like” animal-to-human interspecies jum- 
ping event in coronavirus (CoV). The gen- 
ome of HCoV-EMC is most closely related 
to Tylonycteris bat coronavirus HKU4 
(Ty-BatCoV HKU4) and Pipistrellus bat 
coronavirus HKU5 (Pi-BatCoV HKU5) we 
discovered in 2006. Phylogenetically, HCoV- 
EMC is clustered with Ty-BatCoV HKU4/Pi- 
BatCoV HKU5 with high bootstrap sup- 
ports, indicating that HCoV-EMC is a group 
C betaCoV. The major difference between 
HCoV-EMC and Ty-BatCoV HKU4/Pi-BatCoV
HKU5 is in the region between S
and E, where HCoV-EMC possesses five
ORFs (NS3a-NS3e) instead of four, with
low (31%-62%) amino acid identities to
Ty-BatCoV HKU4/Pi-BatCoV HKU5. Comparison
of the seven conserved replicase
domains for species demarcation shows that 
HCoV-EMC is a novel CoV species. More 
intensive surveillance studies in bats and 
other animals may reveal the natural host 
of HCoV-EMC. 
The recent outbreak of severe respiratory 
tract infections associated with a novel 
human group C betacoronavirus originating
from Saudi Arabia has drawn global attention 
to another highly probable “SARS-like” 
interspecies jumping event of coronavirus 
(CoV) from animal to human. In June 2012, 
a novel CoV was isolated using Vero cells 
from the lung tissue of a 60-year-old resident 
of Saudi Arabia with fatal acute pneumonia 
and renal failure. In September 2012, another 
49-year-old male resident of Qatar with 
severe acute pneumonia and renal failure 
and recent travel history to Saudi Arabia 
was admitted to an intensive care unit in 
Qatar. RT-PCR and sequencing of a short 
fragment of RNA-dependent RNA polymer- 
ase (RdRp) confirmed the presence of the 
same CoV as detected in the first Saudi 
Arabian case.' Complete genome sequencing 
of the virus isolated from the first patient was 
performed by Fouchier et al. at the Erasmus 
University Medical Centre, the Netherlands, 
and the sequence was released on September 
28, 2012 (GenBank accession no. JX869059
and named as human betacoronavirus 2c 
EMC/2012). So far, there is no evidence of 
human-to-human transmission. The source 
of the virus remains obscure. In this article, 
this novel human group C betaCoV is abbre- 
viated as HCoV-EMC. 

After the SARS epidemic, we started to 
focus on CoV biodiversity, genomics and 
phylogeny and built up an evolutionary 
map of CoV evolution. Before 2003, there 
were fewer than 10 CoVs with complete genomes
available, which included two human
CoVs, human coronavirus 229E (HCoV- 
229E) and human coronavirus OC43 
(HCoV-OC43). By September 2012, the 
number of CoVs with complete genomes 
sequenced had tripled. These include two additional
human CoVs, human coronavirus


NL63 (HCoV-NL63) and human coronavirus 
HKU1 (HCoV-HKU1).”” Traditionally, CoVs 
were classified into groups 1, 2 and 3. In 
2011, the Coronavirus Study Group of the 
International Committee on Taxonomy of
Viruses reclassified these three groups
of CoVs as three genera, Alphacoronavirus, 
Betacoronavirus and Gammacoronavirus; and 
we have discovered a fourth genus of CoV, 
Deltacoronavirus, which includes at least nine 
avian CoVs and a porcine coronavirus 
HKU15.*° Within the betaCoVs, they are fur- 
ther subclassified into group A, including 
HCoV-HKU1, HCoV-OC43, bovine corona- 
virus (BCoV), sable antelope coronavirus, 
giraffe coronavirus, equine coronavirus, por- 
cine hemagglutinating encephalomyelitis 
virus, murine hepatitis virus, rat coronavirus 
and rabbit coronavirus HKU14 (RbCoV 
HKU14);° group B, including the human
and civet SARS-related CoVs (SARSr-CoV) 
and SARS-related Rhinolophus bat coronavirus 
(SARSr-Rh-BatCoV);”* group C, including 
Tylonycteris bat coronavirus HKU4 (Ty- 
BatCoV HKU4) and Pipistrellus bat corona- 
virus HKU5 (Pi-BatCoV HKU5) we discov- 
ered in 2006;”!° and group D, including 
Rousettus bat coronavirus HKU9 (Ro-BatCoV 
HKU9).!°"! In addition to Ty-BatCoV HKU4 
and Pi-BatCoV HKU5, other group C bat 
betaCoVs should also be present, but their 
complete genome sequences are not avail- 
able.!"? Based on the CoVs discovered, we 
have constructed a model of CoV evolution, 
with evidence supporting that bat CoVs are 
the gene source of alphaCoVs and betaCoVs 
and avian CoVs are the gene source of 
gammaCoVs and deltaCoVs.° All these works 
have laid down an evolutionary map for rapid 
phylogenetic and bioinformatics analyses of 
HCoV-EMC. The diversity of CoVs is a result 
of the infidelity of RdRp, which makes CoV
genomes especially plastic, a high frequency 
of homologous RNA recombination due to 
their unique random template switching dur- 
ing RNA replication, and their large genomes. 
In addition to biodiversity, a number of natural
recombination and possible interspecies
jumping events have also been documented in
betaCoVs.°'"!*8 For group A betaCoVs, 
molecular clock analysis has shown that HCoV- 
OC43 is a relatively recent zoonotic virus of 
bovine origin that emerged around 1890,
likely from bovine-to-human transmission.’” 
We have also recently discovered RbCoV 
HKU14, closely related to other members 
of the species Betacoronavirus 1 including 
HCoV-OC43 and BCoV, with recombination 
events that may have played a role in inter- 
species transmission of these HCoV-OC43- 
related viruses between human, cattle, rab- 
bits, swine and horses.° Despite having circu- 
lated in humans for more than a century, 
HCoV-OC43 is also found to be continuously 
evolving, with the recent emergence of a 
novel genotype due to natural recombina- 
tion.’° For group B betaCoVs, SARSr-CoV 
is believed to be transmitted from civet to 
humans, although it is the horseshoe bat that 
was likely the primary host.”* Civet SARSr- 
CoV was also likely a recombinant virus aris- 
ing from different strains of SARSr-Rh- 
BatCoV from different geographical locations 
in China.'*'® Although no interspecies trans- 
mission events have been documented in 
group D betaCoVs, we have also identified 
recombination events between different Ro- 
BatCoV HKU9 strains from different bat 
individuals, which may have allowed for the 


generation of different genotypes.'’ While 
these findings support the view that betaCoVs have
the propensity to recombine and cause inter- 
species transmission, such events were 
unknown in group C betaCoVs. As HCoV- 
EMC is most closely related to Ty-BatCoV 
HKU4 and Pi-BatCoV HKU5, it would be 
important to study their genetic relatedness, 
which may provide clues on whether bats are 
the possible origin as in SARSr-CoV. 

The genome characteristics and organiza- 
tion of HCoV-EMC are similar to those of 
Ty-BatCoV HKU4 and Pi-BatCoV HKU5. 
Ty-BatCoV HKU4 was discovered from lesser 
bamboo bats (Tylonycteris pachypus) and Pi- 
BatCoV HKU5 was discovered from Japanese 
pipistrelles (Pipistrellus abramus) in Hong 
Kong.” Both lesser bamboo bats and 
Japanese pipistrelles are insectivorous micro- 
bats found in China and some other parts of 
Asia. The size of the genome of HCoV-EMC is 
30 106 bases, slightly smaller than those of 
Ty-BatCoV HKU4 (30 286 to 30 316 bases) 
and Pi-BatCoV HKU5 (30 482 to 30 488 
bases); and the G+C content is 41%, in 
between those of Ty-BatCoV HKU4 (38%) 
and Pi-BatCoV HKU5 (43%). The replicase 
ORF1ab occupies 21.5 kb of the genome. This
ORF encodes 16 putative non-structural pro- 
teins, including nsp3 (which contains the 
putative papain-like protease (PLpro)), nsp5
(putative chymotrypsin-like protease (3CLpro)),
nsp12 (putative RdRp), nsp13 (putative heli- 
case (Hel)) and other proteins of unknown 
functions. These proteins are produced by 
proteolytic cleavage of the large replicase 
polyprotein by PLpro and 3CLpro at specific
sites which are conserved with those in Ty- 


BatCoV HKU4 and/or Pi-BatCoV HKU5 
(Table 1). 
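
The G+C comparison above reduces to counting bases. A minimal Python sketch of that calculation follows; the function name `gc_content` and the toy sequence are illustrative assumptions, not part of the original analysis, and a real calculation would use the full genome sequence (for example GenBank JX869059).

```python
def gc_content(seq: str) -> float:
    """Return the G+C fraction of a nucleotide sequence (RNA or DNA)."""
    seq = seq.upper()
    # Count only unambiguous bases so that N or gap characters do not bias the result.
    counted = [base for base in seq if base in "ACGTU"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in counted if base in "GC")
    return gc / len(counted)


# Toy sequence for illustration only (not a real viral genome).
toy = "ACGAACGGTTAUCCGA"
print(f"G+C content: {gc_content(toy):.0%}")
```

Applied to the three genomes, the same function would be expected to return values close to the 38%, 41% and 43% quoted above.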

HCoV-EMC has the same basic genome 
structure as Ty-BatCoV HKU4 and Pi- 
BatCoV HKU5 (Figure 1). It also possesses 
the same putative transcription regulatory 
sequence (TRS) motif, 5’-ACGAAC-3’, as 
Ty-BatCoV HKU4 and Pi-BatCoV HKU5, 
which is located at the 3’ end of the leader sequence and
precedes each ORF except NS3c, NS3e and N.
This TRS has also been shown to be the TRS 
for other group B, C and D betaCoVs. The 
TRS for N is 5’-ACGAAU-3’. Similar to other
group B, C and D betaCoVs, the genome 
of HCoV-EMC has a putative PLpro, which
is homologous to PL2pro of alphaCoVs and
group A betaCoVs and PLpro of gammaCoVs
and deltaCoVs. Similar to Ty-BatCoV HKU4 
and Pi-BatCoV HKU5, no proteolytic cleavage 
site is present in S of HCoV-EMC. All cysteine 
residues in S of HCoV-EMC, Ty-BatCoV 
HKU4 and Pi-BatCoV HKU5 are conserved. 
In contrast to the genomes of Ty-BatCoV 
HKU4 and Pi-BatCoV HKU5 which contain 
four ORFs that encode putative non-structural 
proteins (NS3a, NS3b, NS3c and NS3d) 
between S and E, this region of HCoV-EMC 
contains five ORFs that encode putative non- 
structural proteins NS3a, NS3b, NS3c, NS3d 
and NS3e (Figure 1). This is the region of 
HCoV-EMC that possesses the lowest amino 
acid identities to those in Ty-BatCoV HKU4 
and Pi-BatCoV HKU5. NS3a, NS3b and NS3c 
of HCoV-EMC possess 42%-43%, 41%-47% 
and 31% amino acid identities to NS3a, NS3b 
and NS3c of Ty-BatCoV HKU4 and Pi- 
BatCoV HKU5, respectively. NS3d of HCoV- 
EMC is homologous to amino acids 1 to 110/ 


Table 1 Characteristics of putative non-structural proteins of ORF1ab in Ty-BatCoV HKU4, Pi-BatCoV HKU5 and HCoV-EMC


Amino acids (first residue and position - last residue and position)

nsp Putative function/domain* Ty-BatCoV HKU4 Pi-BatCoV HKU5 HCoV-EMC
nsp1 Unknown mig! mig! mi_glss
nsp2 Unknown p196_@847 p196_G851 p194_G85s
nsp3 Putative PLpro domain M848_G2784 jAPO2_G P88 APOt Gels?
nsp4 Hydrophobic domain GG Gg! Ge/seugset!
nsp5 3CLpro 53292_Q3597 $3338_Q36438 S328 sess
nsp6 Hydrophobic domain Seg? Soe Qasss $8084 Qs845
nsp7 Unknown G3890_Qs972 $3936_Q4018 Ga846_ 2928
nsp8 Unknown AeA Qt pols gael pores gtes
nsp9 Unknown N4172_Q428 1 N42 18, Qts2/ N4128_@4287
nsp10 Unknown A4282_Q4420 A*828_Q“4e6 MeQR
nsp11 Unknown (short peptide at the end of ORF1a) goat ayes See aaa Shee
nsp12 RdRp g442 tess Speen Omori S4978_QP2 10
nsp13 Hel AB395_Qs952 APA01_Qs998 AS 1 1 Qee0s:
nsp14 ExoN Seabee O beatel Geese ese 5§909_ 6432
nsp15 NendoU GO478Qe8t7 @oozs_ges7t G®433_Q6775
nsp16 2’-O-MT AS818_| 7119 A6872_R7179 A\6776_p7078


Abbreviations: PLpro, papain-like protease; 3CLpro, chymotrypsin-like protease; RdRp, RNA-dependent RNA polymerase; Hel, helicase; ExoN, 3’-to-5’ exonuclease;
NendoU, poly(U)-specific endoribonuclease; 2’-O-MT, S-adenosylmethionine-dependent 2’-O-ribose methyltransferase.
Figure 1 Genome organizations of HCoV-EMC and other betaCoVs. Papain-like proteases (PL1pro, PL2pro and PLpro), chymotrypsin-like protease (3CLpro) and RNA-dependent
RNA polymerase (RdRp) are represented by orange boxes. Haemagglutinin esterase (HE), spike (S), envelope (E), membrane (M) and nucleocapsid (N)
are represented by green boxes. Putative accessory proteins are represented by blue boxes. HCoV-EMC is shown in bold. 


103 of NS3d in Ty-BatCoV HKU4 and Pi-BatCoV
HKU5 (35%-49% amino acid identities),
with a stop codon UAG present at nucleotide
position 27 160, leading to premature
termination. NS3e of HCoV-EMC is homolog- 
ous to amino acids 116/122 to 223/227 of 
NS3d in Ty-BatCoV HKU4 and Pi-BatCoV 
HKU5 (60%-62% amino acid identities).
NS3c and NS3e do not possess any TRS or 
internal ribosomal entry site. BLAST search 
revealed no amino acid similarities between 
these putative non-structural proteins and 
other known proteins and no functional 
domains were identified by PFAM and Inter- 
ProScan. TMHMM and TMpred analyses 
show one and two putative transmembrane 
domains in NS3a (residues 9 to 29) and 
NS3d (residues 36 to 56 and 71 to 91), respect- 
ively. Similar to Ty-BatCoV HKU4 and Pi- 
BatCoV HKU5, the 3’ untranslated region of 
the genome of HCoV-EMC contains predicted 
bulged stem-loop structures 16 to 76 nucleo- 
tides downstream of the N genes. Downstream 
of the bulged stem-loop structure, 97 to 121
nucleotides downstream of the N genes, a 


pseudoknot structure is present. Bootscan 
analysis did not show any recombination 
between HCoV-EMC, Ty-BatCoV HKU4 
and Pi-BatCoV HKU5. 
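
Locating candidate TRS motifs such as those described above amounts to a simple pattern scan. The Python sketch below is a minimal illustration under stated assumptions: the function name `find_motif` and the toy sequence are hypothetical, and a motif hit on its own does not establish a functional TRS.

```python
import re


def find_motif(genome: str, motif: str = "ACGAAC") -> list[int]:
    """Return 1-based start positions of every (possibly overlapping) motif hit."""
    # Normalise case and treat DNA-style T as RNA U so either notation works.
    genome = genome.upper().replace("T", "U")
    motif = motif.upper().replace("T", "U")
    # A zero-width lookahead lets overlapping occurrences be reported too.
    pattern = re.compile(f"(?={re.escape(motif)})")
    return [match.start() + 1 for match in pattern.finditer(genome)]


# Toy sequence for illustration only (not taken from a real genome).
toy = "GGACGAACUUUAUGCCCACGAAUAAACGAACAUG"
print(find_motif(toy))            # hits for the canonical motif 5'-ACGAAC-3'
print(find_motif(toy, "ACGAAU"))  # hits for the variant reported for N
```

In practice each hit would then be checked against the position of the downstream ORF start codon.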

The phylogenetic trees constructed using 
the amino acid sequences of the 3CLpro,
RdRp, Hel, S and N of HCoV-EMC and other 
CoVs are shown in Figure 2. For all the five 
genes, HCoV-EMC is clustered with Ty-BatCoV
HKU4 and Pi-BatCoV HKU5, with
high bootstrap supports in all cases, indi- 
cating that HCoV-EMC is a group C 
betaCoV (Figure 2). Although it seems that 
HCoV-EMC is clustered with Pi-BatCoV 
HKU5 in the phylogenetic trees constructed 
using RdRp and Hel, the bootstrap supports 
were only 652 and 588 out of 1000, respectively, suggesting
that there is no obvious difference
between the relatedness of HCoV-EMC to 
Ty-BatCoV HKU4 and Pi-BatCoV HKU5. 
Comparison of the amino acid identities of 
the seven conserved replicase domains for 
species demarcation (ADRP, nsp5 (3CLpro),
nsp12 (RdRp), nsp13 (Hel), nsp14 (ExoN), 
nsp15 (NendoU) and nsp16 (2’-O-MT))


between HCoV-EMC, Ty-BatCoV HKU4 
and Pi-BatCoV HKU5 showed that there is 
less than 90% identity in four of the seven 
domains (ADRP 68%-69% identity, nsp5 
81%-83% identity, nsp15 76%-80% identity 
and nsp16 84%-85% identity), indicating 
that HCoV-EMC is a novel CoV species. For 
nsp12, nsp13 and nsp14, there are 90%-92%, 
92%-94% and 86%-92% amino acid identit- 
ies between HCoV-EMC and Ty-BatCoV 
HKU4/Pi-BatCoV HKU5.
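
The species-demarcation argument above can be checked mechanically. The Python sketch below encodes the identity ranges quoted in this paragraph and lists the domains whose highest reported identity is still below the 90% threshold; the dictionary layout and the function name are illustrative assumptions.

```python
# Amino acid identity ranges (%) between HCoV-EMC and Ty-BatCoV HKU4/Pi-BatCoV HKU5,
# copied from the paragraph above as (lowest, highest).
IDENTITY_RANGES = {
    "ADRP": (68, 69),
    "nsp5 (3CLpro)": (81, 83),
    "nsp12 (RdRp)": (90, 92),
    "nsp13 (Hel)": (92, 94),
    "nsp14 (ExoN)": (86, 92),
    "nsp15 (NendoU)": (76, 80),
    "nsp16 (2'-O-MT)": (84, 85),
}

THRESHOLD = 90.0  # identity threshold (%) used in the text


def domains_below_threshold(ranges, threshold=THRESHOLD):
    """Return domains whose highest reported identity is still below the threshold."""
    return [name for name, (_, high) in ranges.items() if high < threshold]


below = domains_below_threshold(IDENTITY_RANGES)
print(f"{len(below)} of {len(IDENTITY_RANGES)} domains below {THRESHOLD:.0f}%: "
      + ", ".join(below))
```

Using the upper end of each range is the conservative choice: a domain is counted only if HCoV-EMC falls below the threshold against both bat viruses.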

Using the sequences available at the 
moment and Yule process speciation under 
a relaxed clock model with an uncorrelated 
lognormal distribution, the mean evolutio- 
nary rate of betaCoVs was estimated at 
2.37×10⁻⁴ nucleotide substitutions per site
per year for the RdRp gene. Molecular clock 
analysis using the RdRp gene showed that 
HCoV-EMC diverged from the most recent 
common ancestor of group C betaCoVs at 
~year 941 (HPDs, 529 BC to 1878). 
Compared to the human and civet SARSr- 
CoV and SARSr-Rh-BatCoV cluster, the 
human/civet SARSr-CoV diverged from the 
Figure 2 Phylogenetic analysis of HCoV-EMC. The trees were constructed by the neighbor-joining method using Kimura correction and bootstrap values calculated
from 1000 trees. 318, 951, 600, 1491 and 510 amino acid positions in chymotrypsin-like protease (3CLpro), RNA-dependent RNA polymerase (RdRp), helicase (Hel),
spike (S) and nucleocapsid (N), respectively, were included in the analysis. For 3CLpro, S and N, the scale bars indicate the estimated number of substitutions per 20
amino acids. For RdRp and Hel, the scale bars indicate the estimated number of substitutions per 50 amino acids. PEDV, porcine epidemic diarrhea virus 
(NC_003436); Sc-BatCoV-512, Scotophilus bat coronavirus 512 (NC_009657); TGEV, transmissible gastroenteritis virus (NC_002306); FIPV, feline infectious 
peritonitis virus (AY994055); CCoV, canine coronavirus (GQ477367); PRCV, porcine respiratory coronavirus (DQ811787); Rh-BatCoV-HKU2, Rhinolophus bat
coronavirus HKU2 (EF203064); Mi-BatCoV 1A, Miniopterus bat coronavirus 1A (NC_010437); Mi-BatCoV 1B, Miniopterus bat coronavirus 1B (NC_010436); 


Mi-BatCoV-HKU8, Miniopterus bat coronavirus HKU8 (NC_010438); Hi-BatCoV HKU10, Hipposideros bat coronavirus HKU10 (JQ989269); Ro-BatCoV HKU10,
Rousettus bat coronavirus HKU10 (JQ989270); HCoV-229E, human coronavirus 229E (NC_002645); HCoV-NL63, human coronavirus NL63 (NC_005831); HCoV-OC43,
human coronavirus OC43 (NC_005147); BCoV, bovine coronavirus (NC_003045); AntelopeCoV, sable antelope coronavirus (EF424621); GiCoV, giraffe


coronavirus (EF424622); ECoV, equine coronavirus 
virus (NC_001846); RCoV, rat coronavirus (NC_O 


coronavirus HKU9 (NC_009021); IBV, infectious 


EF584908); BuCoV HKU11, bulbul coronavirus HK' 
most recent common ancestor of the human/
civet SARSr-CoV and SARSr-Rh-BatCoV at 
~year 1653 (HPDs, 1150 to 1968). By defini- 
tion, the human and civet SARSr-CoV and 
SARSr-Rh-BatCoV are the same CoV species. 
These observations suggest that there should 
be one or more intermediate hosts between 
Ty-BatCoV HKU4, Pi-BatCoV HKU5 and 
HCoV-EMC. Sequencing more strains of 
Ty-BatCoV HKU4, Pi-BatCoV HKU5 and 
HCoV-EMC, as well as other group C 
betaCoVs collected at different time points, 
should be performed to achieve a more accur- 
ate estimation of the divergence time. 
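
For a sense of scale, the figures above can be combined in a back-of-envelope calculation. The Python sketch below assumes a strict clock, which is a simplification because the analysis described above used a relaxed clock. It takes the mean RdRp rate as 2.37×10⁻⁴ substitutions per site per year (the order of magnitude is treated here as an assumption) and uses 2012 and 2003 as sampling years for HCoV-EMC and human SARSr-CoV, respectively (also assumptions). The function name is hypothetical.

```python
RATE = 2.37e-4  # substitutions per site per year; order of magnitude assumed


def expected_substitutions(divergence_year: int, sampling_year: int,
                           rate: float = RATE) -> float:
    """Expected substitutions per site along one lineage under a strict clock."""
    elapsed = sampling_year - divergence_year
    if elapsed < 0:
        raise ValueError("sampling year precedes divergence year")
    return rate * elapsed


# HCoV-EMC: mean divergence estimate ~941; 2012 assumed as the sampling year.
emc = expected_substitutions(941, 2012)
# Human/civet SARSr-CoV: mean divergence estimate ~1653; 2003 assumed as the
# sampling year (the year of the SARS epidemic).
sars = expected_substitutions(1653, 2003)

print(f"HCoV-EMC lineage:  {emc:.3f} substitutions/site")
print(f"SARSr-CoV lineage: {sars:.3f} substitutions/site")
```

Under these assumptions the HCoV-EMC lineage accumulates roughly three times as many expected changes per site as the human/civet SARSr-CoV lineage, which reflects the much longer time since divergence. The wide HPD intervals mean the true values could differ considerably.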

In the last decade, we have already wit- 
nessed the discovery of two novel human 
CoVs and an animal-to-human CoV interspecies
jumping event involving SARSr-CoVs. In
contrast to HCoV-229E, HCoV-OC43, 
HCoV-NL63 and HCoV-HKU1, which are 
notoriously difficult to culture, HCoV-EMC 
and human SARS-CoV are both readily 
cultivable using primate cell lines. This may 
suggest a possible correlation between culti- 
vability and virulence/recent interspecies 
jumping. Sequencing more genomes and per- 
forming evolutionary analysis will help us 
understand whether HCoV-EMC represents
another recent interspecies jumping event
from animal to human or another human
CoV that has stably infected humans. Our
most recent findings showed that CoVs can 
be transmitted between two bat species of dif- 
ferent suborders, suggesting that different 
degrees of interspecies jumping can occur in 
nature.'? More intensive surveillance studies 
for group C betaCoVs in bats and other ani- 
mals may reveal the natural host of this novel 
human group C betaCoV. As coronaviruses 
are prone to recombination and mutation 


2936); RbCoV HKU14, rabbit coronavirus HKU14 ( 


and it has been documented that different 
levels of interspecies jumping can indeed 
occur in nature, we should not underestimate 
the potential of coronaviruses being the cause 
of another major “SARS-like” pandemic. 